Abstract

The  universal  relation  concept  is  intended  to  provide  the  database  user  with  a  simplified  model  in  which  he 
can  compose  queries  without  regard  to  the  underlying  structure  of  the  relations  in  the  database.  Frequently, 
the  lossless  join  criterion  provides  the  query  interpreter  with  the  clue  needed  to  interpret  the  query  as  the 
user intended. However, some examples exist where interpretation by the lossless-join rule runs contrary to
our intuition. To handle some of these cases, we propose a concept called maximal objects, which modifies
the  universal  relation  concept  in  exactly  those  situations  where  it  appears  to  go  awry — when  the  underlying 
relational  structure  has  “cycles.”  We  offer  examples  of  how  the  maximal  object  concept  provides  intuitively 
correct  interpretations.  We  also  consider  how  one  might  construct  maximal  objects  mechanically  from  purely 
syntactic  structural  information — the  relation  schemes  and  functional  dependencies — about  the  database. 

I.  Introduction 

We  assume  the  reader  is  familiar  with  basic  concepts  in  relational  database  theory,  such  as  functional  and 
multivalued  dependencies,  and  the  operators  of  relational  algebra,  particularly  projection  and  (natural)  join. 
[Ul]  or  [Mai]  contains  the  needed  background. 

We also expect the reader is familiar with the notion of a join dependency (JD), which is expressed
$\bowtie(R_1, R_2, \ldots, R_n)$, where each $R_i$, $1 \le i \le n$, is a set of attributes, and is satisfied by a relation $r$ over
$R = \bigcup_{i=1}^{n} R_i$ if and only if the join of the projections of $r$ onto the $R_i$’s is $r$ itself. Formally:

$$r = \bowtie_{i=1}^{n} \pi_{R_i}(r)$$
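
As an illustration of this definition, the following minimal Python sketch tests whether a small relation satisfies a JD by joining its projections and comparing the result with the relation itself. The helper names (`relation`, `project`, `natural_join`, `satisfies_jd`) and the sample data are assumptions made for the example; they do not come from the paper.

```python
from functools import reduce

def relation(rows):
    """Build a relation as a set of tuples; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}

def project(rel, attrs):
    """Projection onto attrs (duplicates vanish because the result is a set)."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}

def natural_join(left, right):
    """Natural join: combine tuples that agree on every shared attribute."""
    out = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(l | r)
    return out

def satisfies_jd(rel, schemes):
    """True when rel equals the join of its projections onto the given schemes."""
    return reduce(natural_join, (project(rel, s) for s in schemes)) == rel

# Hypothetical two-tuple relation over attributes A, B, C.
r = relation([
    {"A": 1, "B": "x", "C": "p"},
    {"A": 1, "B": "y", "C": "q"},
])
holds = satisfies_jd(r, [{"A", "B"}, {"B", "C"}])  # B ties each pair back together
fails = satisfies_jd(r, [{"A", "B"}, {"A", "C"}])  # joining on A creates spurious tuples
```

In this sample, the decomposition on {A, B} and {B, C} reproduces the relation, while the one on {A, B} and {A, C} yields four tuples instead of two, so the second JD is not satisfied.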

A useful notation for JD’s was introduced in [FMU]. For $\bowtie(R_1, R_2, \ldots, R_n)$ we construct a hypergraph
(a graph in which “edges” are arbitrary sets of nodes rather than doubletons only) as follows. For each
attribute appearing in one or more of the $R_i$’s, the hypergraph has a node. For each $R_i$, the hypergraph has
an edge consisting of all the members of $R_i$.

Also studied in [FMU] and [BFMY] was a subclass of the JD’s: those that have “acyclic” hypergraphs.
The term “acyclic” was given many equivalent definitions in these two papers; here we shall introduce only
one. We Graham-reduce a hypergraph by applying the following rules in any order (the process is
Church-Rosser, so order doesn’t really matter).

i)  Eliminate  a  node  that  appears  in  only  one  edge. 

ii)  Eliminate  an  edge  that  is  a  subset  of  another  edge. 

Then a hypergraph (and its JD) is acyclic if and only if it reduces to nothing by Graham reduction.
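
The two reduction rules can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `graham_reduce` and `is_acyclic` are hypothetical, and the edge lists are a reading of the bank example used later in the paper (one two-attribute edge per object).

```python
def graham_reduce(edges):
    """Apply rules (i) and (ii) until neither applies; return the edges that remain."""
    edges = [set(e) for e in edges]
    changed = True
    while changed:
        changed = False
        # Rule (i): eliminate every node that appears in only one edge.
        for e in edges:
            lonely = {n for n in e if sum(n in f for f in edges) == 1}
            if lonely:
                e -= lonely
                changed = True
        # Rule (ii): eliminate one edge that is a subset of another edge.
        for i, e in enumerate(edges):
            if any(i != j and e <= f for j, f in enumerate(edges)):
                del edges[i]
                changed = True
                break
    return [e for e in edges if e]

def is_acyclic(edges):
    """A hypergraph is acyclic when Graham reduction leaves nothing."""
    return not graham_reduce(edges)

# Seven two-attribute objects of the bank example (assumed edge list).
bank = [{"BNK", "ACC"}, {"BNK", "L"}, {"ACC", "C"}, {"L", "C"},
        {"ACC", "BAL"}, {"L", "AMT"}, {"C", "ADR"}]
# The same hypergraph with the loan-customer edge removed.
no_hold = [e for e in bank if e != {"L", "C"}]

bank_is_acyclic = is_acyclic(bank)
no_hold_is_acyclic = is_acyclic(no_hold)
```

For the bank hypergraph the reduction stalls on the four edges that form the BNK, ACC, C, L diamond, so it is cyclic; once the loan-customer edge is dropped, everything reduces away.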

It  is  the  goal  of  this  paper  to  contribute  to  the  utility  of  the  “universal  relation”  view  of  data. 
Interestingly,  a  number  of  papers  have  recently  been  written  to  argue  that  the  universal  relation  view  is 
insupportable for one or another reason [Kl, BG, AP]. It is not our intent to argue the details of the matters.
We  shall  advance  only  one  argument  in  its  favor:  it  works;  it  may  not  be  perfect  for  everything,  but  it  does 
certain  things  well  enough  to  be  valued  by  its  users. 

In particular, a universal relation system called System/Q has been operating successfully at Bell
Laboratories for some time [A]. It has enabled a number of nontechnical people to use relational database
systems with little effort, while, just as we would expect, the “experts” must spend considerable effort
preparing the system to work on each database. The System/Q approach to query interpretation is to
provide a “rel file,” which is a list of the sets of relations to join in response to a query, in order of preference.
That is, given a query that mentions a set of attributes X, the system goes down the rel file until it finds
a set of relation schemes whose union includes X, and it then takes the join of these relations and answers
the query as if it were about this join. Further, [S] has recently developed a universal relation system
that constructs joins in response to queries automatically, based on the theory of functional dependencies
and lossless joins. In fact, this strategy of query interpretation bears considerable similarity to the method
we propose, but it does not make use of multivalued or join dependencies, or “declared maximal objects,”
ingredients we consider essential to exploit the power of the universal relation concept. Interestingly, people
at Bell have contemplated doing automatic generation of rel files by a method related to that of Sagiv [S].

† Supported in part by NSF grant IST-79-18264.

We begin with the hypothesis of Fagin, Mendelzon and Ullman [FMU], that “real world” universal
relations can be described by one join dependency and a collection of functional dependencies. They argue
that if the universal relation over a set of attributes $A_1, A_2, \ldots, A_n$ has meaning at all, then we can define
it by

$$u = \{\, \langle a_1 a_2 \cdots a_n \rangle \mid P_1 \wedge P_2 \wedge \cdots \wedge P_k \,\},$$

where each $P_i$ is a predicate taking some set of the $a_j$’s as arguments (some of these $a_j$’s may be null).
If $P_i$ involves $a_{j_1}, a_{j_2}, \ldots, a_{j_l}$, then the set of attributes $R_i = \{A_{j_1}, A_{j_2}, \ldots, A_{j_l}\}$ is said to be an
object; the term “object” corresponds closely to the same term of Sciore [Sc] and is borrowed from there. In
essence, objects are sets of attributes among which there is a significant connection. It is proven in [FMU]
that $u$ can be constructed in this way if and only if $u$ satisfies the JD $\bowtie(R_1, R_2, \ldots, R_k)$.

Example  1:  Suppose  our  universal  relation  scheme  consists  of  the  attributes 

BNK  (bank) 

ACC  (account) 

L  (loan) 

C  (customer) 

AMT  (loan  amount) 

BAL  (account  balance) 

ADR  (customer  address) 

We  assume  the  functional  dependencies 

ACC → BNK BAL

L → BNK AMT

C → ADR

We also assume that the universal relation is defined, in terms of the current “real world facts,” as

{ <bnk, acc, l, c, amt, bal, adr> | ACCAT(bnk, acc) ∧ LAT(bnk, l)

∧ OWN(acc, c) ∧ HOLD(l, c) ∧ HAS(acc, bal) ∧ FOR(l, amt) ∧ LIVES(c, adr) }

where the predicates are defined as

ACCAT(x,  y)  =  account  y  is  at  bank  x 

LAT(x,  y)  =  loan  y  is  at  bank  x 

OWN(x,  y)  =  customer  y  owns  account  x 

HOLD(x,  y)  =  customer  y  holds  loan  x 

HAS(x, y) = account x has balance y

FOR(x, y) = loan x is for amount y

LIVES(x, y) = customer x lives at address y


Each of these predicates uses knowledge about the present state of the real world to constrain the set of
tuples currently appearing in the universal relation. The functional dependencies are facts that we assume
are reflected in the predicates. For example, since C → ADR holds, we do not expect LIVES(x, $y_1$) and
LIVES(x, $y_2$) to be true simultaneously if $y_1 \ne y_2$. However, it is possible that the attributes of an FD are
contained in no object, in which case the FD is still “true” but its effect is not so easily visible. As should
be obvious, the implementation we have in mind for a universal relation

$$u = \{\, \langle a_1 a_2 \cdots a_n \rangle \mid P_1 \wedge P_2 \wedge \cdots \wedge P_k \,\}$$

is a database consisting of relations $r_1, r_2, \ldots, r_k$, where $r_i$ is a relation on scheme $R_i$, and $R_i$ is the object
for $P_i$. (Perhaps some of the relations $r_1, r_2, \ldots, r_k$ are not explicitly stored, but they all can be derived by
projection from stored relations.) We have a tuple $t$ in $r_i$ exactly when $P_i(t)$ is true, so the interpretation
of each relation should be clear. In the future, we shall identify the relations with the predicates, and talk
about relation ACCAT(ACC, BNK), and so on. []

Definition: If $X \subseteq A_1 A_2 \cdots A_n$, the connection among the attributes in $X$, denoted $[X]$, is defined by

$$[X] = \pi_X(u)$$

That is, $[X]$ is the projection of the universal relation onto the attributes in $X$. []

Figure  1  shows  the  hypergraph  representation  of  the  seven  objects  of  our  example.  In  this  case,  each 
hyperedge  consists  of  two  attributes.  This  hypergraph  has  a  cycle,  which  implies  that  at  least  one  of  the 
attributes  is  semantically  overloaded;  it  stands  for  two  different  things.  While  we  cannot  say  for  sure  which 
attribute  is  overloaded,  since  the  database  is  most  likely  designed  from  the  bank’s  point  of  view,  we  shall 
consider  C  overloaded,  representing  customers  in  their  roles  as  depositors  and  borrowers.  We  shall  see  in  the 
next  section  how  this  overloading  causes  queries  to  give  an  intuitively  wrong  answer. 

II.  Queries  on  the  Universal  Relation 

We  shall  consider  the  common  form  of  query  on  the  universal  relation  that  can  be  expressed  by  relational 
operators  select,  project,  and  (natural)  join.  These  will  be  expressed  in  a  QUEL-like  notation  [SWKH,  Ul], 
but  without  range-statements,  since  all  tuple  variables  must  range  over  the  universal  relation.  The  format 
of queries we use is

retrieve <attribute list>
where <condition>

The <attribute list> has the form $(t_1.A_1, t_2.A_2, \ldots, t_m.A_m)$, where the $t_i$’s are (not necessarily distinct)
tuple variables and the $A_i$’s are (not necessarily distinct) attributes. The <condition> is built from operands
that are constants or atoms of the form t.A, for tuple variable t and attribute A, using arithmetic comparison
(=, >, ≥, ...) and Boolean connectives.

The  meaning  of  the  query  is  defined  by  the  following  steps: 

Algorithm  1: 

1. Take the cross product of the universal relation with itself p times, if there are p distinct tuple variables.
That is,

$$E_1 = u \times u \times \cdots \times u \quad (p \text{ times})$$

Each copy of $u$ is said to correspond to one particular tuple variable; the correspondence is arbitrary.

2. Replace each occurrence of $u$ by the join $r_1 \bowtie r_2 \bowtie \cdots \bowtie r_k$ of the relations for all the predicates
$P_1, P_2, \ldots, P_k$. The result of the substitution is expression $E_2$. The justification for this substitution is
that, given that a universal relation $u$ exists, and assuming as in [FMU] that the JD $\bowtie(R_1, \ldots, R_k)$ holds,
the result of the join is $u$. In practice, the $r_i$’s may not be projections of $u$ exactly; there may be
“dangling tuples” that do not contribute to the join $r_1 \bowtie \cdots \bowtie r_k$. We shall discuss the significance of
this discrepancy shortly.

3. Apply the selection operator for the <condition> in the where-clause to $E_2$. Every atom t.A of the
<condition> refers to a unique component of $E_2$, the component for attribute A in the copy of the
universal relation corresponding to t. Let the result be $E_3$.

4. Apply the projection operator for the list of components mentioned in the <attribute list> to $E_3$. The
result is an algebraic expression $E_4$.

5. Optimize the expression under “weak equivalence,” that is, find a minimal expression $E_5$ that is
equivalent to $E_4$ under the assumption that the $r_i$’s are the projections of one universal relation. For
expressions of the type we have constructed, assuming reasonable selection conditions, such a minimum
always exists ([ASU], [K1]) and can be found efficiently.

Intuitively, the last step throws away terms from the join if they are not necessary to connect one
or more attributes in the query. In fact, when (and only when) the hypergraph of the JD defining the
universal relation’s structure is acyclic, the expression $E_5$ really does invariably find the minimal lossless join
connecting the attributes of the query [MU]. The fact that the expression $E_5$ involves as few joins as possible
has the desirable effect, among others, of ensuring that dangling tuples can contribute to the answer as long
as they join successfully with tuples in those of the $r_i$’s that are actually involved in the join.

Example 2: Consider the query on the universal relation of Fig. 1:

retrieve(t.ADR)    (Q1)
where t.C = ‘Jones’

This query asks us to print Jones’ address. If we follow Algorithm 1 we find, naturally enough, that the
expression $E_5$ involves “joining” only one relation, LIVES, selecting for C = ‘Jones’, and projecting the result
onto ADR. Notice how the question whether the tuple or tuples with C = ‘Jones’ in LIVES are dangling or
not never comes up. Even if Jones does not appear in the hold or has relations, or for some other reason
the join of all the relations includes no tuple with C = ‘Jones’, our response to Q1 has been the intuitively
correct one. Unfortunately, when the hypergraph defining the structure of the universal relation is cyclic, as
Fig. 1 is, Algorithm 1 can give intuitively wrong answers to queries, primarily, it appears, because dangling
tuples are not always treated properly, but also because the minimal connection among the attributes of the
query will not necessarily be embodied in the join of the expression $E_5$ of Algorithm 1.

Example 3: Consider, for the same database:

retrieve(t.BNK)    (Q2)
where t.C = ‘Jones’

If we apply Algorithm 1, we find that the answer to query Q2 is the set of banks where Jones has both a
loan and an account. If we take for granted that the meaning of Q2 is the set of banks at which Jones has
either a loan or an account, and are not willing to incorporate dummy information about a loan when Jones
opens an account at National, to make Q2 come out correctly, then we conclude that Algorithm 1 does not
handle Q2 properly. The problem seems to be that if Jones has only a loan at National, the tuples that
connect Jones and National are dangling.
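
The effect described in Example 3 can be reproduced on a toy instance. The sketch below is an illustration only: the data (accounts a1 and a2, loan l1, the customer Smith, the balances and addresses) are invented, and `rel`, `natural_join`, and `banks_of` are hypothetical helpers. Jones has an account at National but no loan, so his tuples dangle in the join of all seven relations.

```python
from functools import reduce

def rel(attrs, *rows):
    """A relation as a set of tuples; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(attrs, row)) for row in rows}

def natural_join(left, right):
    """Natural join: combine tuples that agree on every shared attribute."""
    out = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(l | r)
    return out

# Invented instance: Jones only has an account; Smith has an account and a loan.
db = {
    "ACCAT": rel(("BNK", "ACC"), ("National", "a1"), ("First", "a2")),
    "LAT":   rel(("BNK", "L"), ("First", "l1")),
    "OWN":   rel(("ACC", "C"), ("a1", "Jones"), ("a2", "Smith")),
    "HOLD":  rel(("L", "C"), ("l1", "Smith")),
    "HAS":   rel(("ACC", "BAL"), ("a1", 100), ("a2", 50)),
    "FOR":   rel(("L", "AMT"), ("l1", 500)),
    "LIVES": rel(("C", "ADR"), ("Jones", "1 Elm St"), ("Smith", "2 Oak St")),
}

def banks_of(customer, names):
    """Join the named relations, select C = customer, and project onto BNK."""
    joined = reduce(natural_join, (db[n] for n in names))
    return {dict(t)["BNK"] for t in joined if dict(t)["C"] == customer}

via_full_join = banks_of("Jones", list(db))            # join of all seven relations
via_accounts = banks_of("Jones", ["ACCAT", "OWN"])     # account path only
```

On this instance the full join loses Jones altogether, while the join of ACCAT and OWN alone reports National, which is the answer one intuitively expects.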

Example 4: The following query is similar to Q2:

retrieve(t.ACC)    (Q3)
where t.L = 4-326

Query Q3, like query Q2, jumps across the diamond of Fig. 1. Despite this similarity, the intended meaning
of Q3 is not likely to be “Print the accounts that are either at the same bank as loan 4-326 or are owned
by the customer who also holds loan 4-326.” In fact, it isn’t clear that Q3 has any natural meaning. This
example points up the fact that multiple paths connecting attributes can be a source of ambiguity for systems
trying to deal with universal relations.

III. Maximal Objects

Evidently, we need some black magic. This magic must cause Q2 and Q3 to produce the correct answers,
and it must be sufficiently powerful to distinguish between them, since they appear to be syntactically the
same query. The magic might come from a wave of the semantic wand, such as the semantics of Codd [Co],
where we worry about how attributes represent “entities” and “relationships,” which are concepts rooted in
what real world the database represents. We would prefer not to rely on semantic notions; rather, we would
like purely syntactic ideas, because while computers tend to be incapable of dealing with semantics directly,
they grind away at syntactic calculations quite amiably and with a deal of efficiency as well.

The extra syntactic notion we propose supplying, in addition to the FD’s and objects, is a collection
of sets of objects. Each set of objects is called a maximal object. Intuitively, the maximal objects are
the largest sets of objects in which we are willing to navigate. For practical reasons, every object must be
in at least one maximal object. We associate zero or more maximal objects with each tuple variable of a
query: those maximal objects that contain all the attributes mentioned by the query in connection with
that tuple variable.

Example 5: Figure 2 shows the two maximal objects we select for Fig. 1. The arrows represent FD’s, but
we ignore them for the moment; we shall use them when we explain how maximal objects might be formed.
There are two maximal objects,

{ C-ADR, C-ACC, ACC-BAL, ACC-BNK } and
{ C-ADR, C-L, L-AMT, L-BNK }

which we call the upper and lower maximal objects, respectively. []

Query Q2 has one tuple variable t, and its associated attributes are C and BNK. Both attributes are
contained in both maximal objects. We are willing to “navigate” within either maximal object. We
take the meaning of Q2 to be the union of the answers we get by evaluating the query over each maximal
object.

When we evaluate with respect to the upper maximal object, we get the banks at which the customer
has an account. That is, the optimization step in Algorithm 1 leaves us with the join of the ACCAT and
OWN relations only. If Jones has an account at National, then we shall be told that fact when we apply
Algorithm 1 to the upper maximal object, even if we have recorded in the database no address for Jones,
no balance for any of his accounts, there is no loan by National to Jones, or any other problem arises that
would technically cause Jones and National not to be related in the universal relation. The reader must
judge for himself whether this is a reasonable response by a universal relation system, but we believe that to
be the case.

When we evaluate Q2 in the lower maximal object, we get the banks at which the customer has loans.
Thus the interpretation of Q2 is the set of banks at which the customer has either an account or a loan, as
we intuitively feel it should be.

Now consider query Q3, which relates accounts and loans. These two attributes occur together in no
maximal object. Thus an empty set of accounts should be produced by the system, or better, an error
message saying the query cannot be processed or is ambiguous. We can still ask for the accounts held by
the holder of 4-326:

retrieve(t.ACC)    (Q4)
where t.C = s.C and s.L = 4-326

In this query, the attributes connected with t lie in the upper maximal object and those connected with s
lie in the lower maximal object. Q4 gives the intuitively correct result. A similar query would give us all the
accounts at the bank making loan 4-326.

One anomaly we face concerns trivial queries like:

retrieve(t.C)    (Q5)
where t.C = t.C

That is, print all the customers. If we work in the upper maximal object, for example, Algorithm 1 tells us to
answer the query by taking some relation whose scheme includes C and projecting it onto C. The result of Step
(5) in Algorithm 1 is ambiguous, since the optimizer assumes weak equivalence of expressions is sufficient,
i.e., the equivalence of expressions depends only on their values when the relations to which they apply are
the projections of a universal relation. If that were the case, we would indeed get the same result whether
we projected OWN or LIVES onto C.

In practice, dangling tuples may make the results of projections from OWN and LIVES onto C different.
The maximal object concept we propose does not deal with this problem, and the solution probably lies in
a modification of Algorithm 1 to take the union of all relations capable of producing a projection that
the minimized expression calls for. In that case, the response to Q5 would be all customers mentioned in
any of OWN, HOLD, and LIVES.

Let us now formalize our notion of maximal objects. At the outset, the
reader should be aware that maximal objects are not part of a “data model.” Rather, they are parameters
that influence a particular algorithm for query interpretation. Whether or not they produce the intuitively
“correct” interpretation of queries in all situations is for the reader to judge. We can only give examples,
such as Q2 and Q3, where the method seems to handle hard cases properly, and we shall give some intuition
that supports our method, to be described later, for selecting maximal objects.

Definition: Let $m = \{R_1, R_2, \ldots, R_k\}$ be a maximal object. Let $U_m = R_1 \cup R_2 \cup \cdots \cup R_k$, let $r_1, r_2, \ldots, r_k$
be the database relations for the objects $R_1, R_2, \ldots, R_k$, and let $u_m = r_1 \bowtie r_2 \bowtie \cdots \bowtie r_k$. If $X$ is a set
of attributes, the connection in $m$ among the attributes of $X$, denoted $[X, m]$, is $\pi_X(u_m)$ if $X \subseteq U_m$ and $\emptyset$
otherwise.

If $M = \{m_1, m_2, \ldots, m_q\}$ is the set of maximal objects for the database, the connection among the
attributes in $X$ in the database is given by

$$[X] = [X, m_1] \cup [X, m_2] \cup \cdots \cup [X, m_q]$$

This definition says to interpret queries as if we had a universal relation $u$ given by:

$$u = u_{m_1} \cup u_{m_2} \cup \cdots \cup u_{m_q}$$

where each $u_{m_i}$ has its tuples padded with nulls to be over the universal scheme.
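
The definition of the connection over maximal objects can be sketched as follows. The code is a hedged illustration, not the paper's implementation: the instance is invented (Jones owns account a1 at National and holds loan l1 at First), the names `connection_in` and `connection` are hypothetical, and the sketch projects each $u_m$ onto X before taking the union, which sidesteps the null padding mentioned above.

```python
from functools import reduce

SCHEMES = {
    "ACCAT": ("BNK", "ACC"), "LAT": ("BNK", "L"), "OWN": ("ACC", "C"),
    "HOLD": ("L", "C"), "HAS": ("ACC", "BAL"), "FOR": ("L", "AMT"),
    "LIVES": ("C", "ADR"),
}
ROWS = {  # invented data
    "ACCAT": [("National", "a1")], "LAT": [("First", "l1")],
    "OWN": [("a1", "Jones")], "HOLD": [("l1", "Jones")],
    "HAS": [("a1", 100)], "FOR": [("l1", 500)],
    "LIVES": [("Jones", "1 Elm St")],
}
db = {n: {frozenset(zip(SCHEMES[n], row)) for row in ROWS[n]} for n in SCHEMES}

UPPER = ["LIVES", "OWN", "HAS", "ACCAT"]   # C-ADR, C-ACC, ACC-BAL, ACC-BNK
LOWER = ["LIVES", "HOLD", "FOR", "LAT"]    # C-ADR, C-L, L-AMT, L-BNK

def natural_join(left, right):
    """Natural join: combine tuples that agree on every shared attribute."""
    out = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(l | r)
    return out

def connection_in(X, m):
    """[X, m]: projection of u_m onto X, or the empty set if X is not within U_m."""
    U_m = set().union(*(SCHEMES[n] for n in m))
    if not set(X) <= U_m:
        return set()
    u_m = reduce(natural_join, (db[n] for n in m))
    return {frozenset((a, v) for a, v in t if a in X) for t in u_m}

def connection(X, maximal_objects):
    """[X]: union of [X, m] over all maximal objects m."""
    return set().union(*(connection_in(X, m) for m in maximal_objects))

c_bnk = connection({"C", "BNK"}, [UPPER, LOWER])
acc_l = connection({"ACC", "L"}, [UPPER, LOWER])
```

Here `c_bnk` pairs Jones with both National (through the upper maximal object) and First (through the lower one), whereas `acc_l` is empty because no maximal object contains both ACC and L.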

This change in the interpretation of the database indicates we should modify Algorithm 1, in order to
limit the range of navigation for tuple variables to maximal objects. We no longer construct a copy of the
universal relation for each tuple variable. Instead, we find, for each tuple variable $t$, the (possibly empty)
collection $M_t$ of maximal objects that each include all the attributes associated with $t$ in the query. For
each $m$ in $M_t$, we construct the relation $u_m$ as in the definition above. We then let $t$ range over all tuples
in the union of all the $u_m$’s such that $m$ is in $M_t$. We formalize this construction in the next algorithm.

Algorithm 2: Given query $Q$ mentioning tuple variables $t_1, t_2, \ldots, t_p$, and given maximal objects (sets of
objects) $m_1, m_2, \ldots, m_q$, we convert $Q$ to an algebraic expression as follows.

1. For each $t_i$, let $X_i = \{ B \mid t_i.B \text{ appears in } Q \}$. Let $M_i$ be the set of maximal objects $m_j$ such that $X_i \subseteq U_j$,
where $U_j$ is the union of the objects in $m_j$.

2. For each maximal object $m_j$, let $J_j$ be the algebraic expression for the natural join of all the relations
on objects in $m_j$.

3. For each tuple variable $t_i$, construct the algebraic expression $K_i$ to be the union of the expressions $J_j$ over
all $j$ such that $m_j$ is in $M_i$.

4. Let $E_2 = K_1 \times K_2 \times \cdots \times K_p$.

5. Construct $E_3$ by applying selection to $E_2$, construct $E_4$ from $E_3$ by applying projection, and construct
$E_5$ from $E_4$ by optimization, exactly as in Steps 3, 4, and 5 of Algorithm 1. []
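
Step 1 of Algorithm 2, the choice of maximal objects for each tuple variable, can be sketched as below. The representation is an assumption made for illustration: a query is reduced to a dictionary from tuple variable to the set of attributes it mentions, and `MAXIMAL_OBJECTS` and `covering_objects` are hypothetical names. The two maximal objects are those of Example 5.

```python
# Maximal objects of Example 5, each a list of objects (sets of attributes).
MAXIMAL_OBJECTS = {
    "upper": [{"C", "ADR"}, {"C", "ACC"}, {"ACC", "BAL"}, {"ACC", "BNK"}],
    "lower": [{"C", "ADR"}, {"C", "L"}, {"L", "AMT"}, {"L", "BNK"}],
}

def covering_objects(query_attrs):
    """For each tuple variable t_i with attribute set X_i, list the maximal
    objects m_j whose attribute union U_j contains X_i (the set M_i)."""
    result = {}
    for var, X in query_attrs.items():
        result[var] = [name for name, objs in MAXIMAL_OBJECTS.items()
                       if X <= set().union(*objs)]
    return result

q2 = covering_objects({"t": {"BNK", "C"}})                     # banks of a customer
q3 = covering_objects({"t": {"ACC", "L"}})                     # accounts for a loan
q4 = covering_objects({"t": {"ACC", "C"}, "s": {"C", "L"}})    # two tuple variables
```

With these inputs, the variable of Q2 may range over both maximal objects, the variable of Q3 over none, and in Q4 the variable t is confined to the upper maximal object and s to the lower one, matching the discussion of those queries.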

IV.  Automatic  Construction  of  Maximal  Objects 

We shall demonstrate a method by which the maximal objects of Fig. 2 might be obtained. In general,
there is probably no substitute for the designer looking at the database and selecting the maximal objects
on the basis of what makes sense to him. Nevertheless, we can provide an algorithm, actually a variety
of algorithms, for constructing maximal objects. The algorithms are based on the principle of Aho, Beeri
and Ullman [ABU] that a join of relations “makes sense” if and only if the join is lossless. While the sets of
maximal objects so obtained may not give the intuitively correct answers to queries in all cases, they work
in many cases, and are a starting point for the database designer.

The principle behind all the algorithms is that we start with a single object and “grow” it into a maximal
object. We add objects to the maximal object being constructed so long as the join of the relations on the
objects included is lossless. That is, we add a new object to the set under construction only if its relation joins
losslessly with those for the objects already in the set. The difference among the algorithms for maximal
objects lies in the strength of the rule used to deduce a lossless join. We assume a procedure LOSSLESS(R, S)
that looks at global information, such as FD’s and the set of all objects, and returns true if and only if the
join of the relations on objects R and S is lossless. We construct maximal objects as follows in Algorithm 3.

It should be noted that the step of that algorithm which finds a new object to add to the maximal object
MO being formed is in a sense nondeterministic, in that we allow any eligible object to be chosen. In many
cases, the predicate LOSSLESS will be monotone, in the sense that when $S \subseteq T$, we have that LOSSLESS(R, S)
implies LOSSLESS(R, T). Then, a candidate for inclusion in MO remains a candidate even if another
candidate is chosen (selection is a “Church-Rosser” system), and the result of the algorithm is unique.

Algorithm 3:

MAXOBJ := ∅;
for each object R do
begin
    MO := {R};
    S := R;
    repeat
        find an object T not in MO such that LOSSLESS(T, S);
        MO := MO ∪ {T};
        S := S ∪ T
    until no such T is found;
    MAXOBJ := MAXOBJ ∪ { MO }
end


We  consider  three  versions  of  LOSSLESS: 

1. LOSSLESS(R, S) = true if and only if $(R \cap S) \to R$ or $(R \cap S) \to S$ (the “FD’s only” rule).

2. LOSSLESS(R, S) = true if and only if $(R \cap S) \twoheadrightarrow (R - S) \mid (S - R)$ (the “MVD” rule).

3. LOSSLESS(R, S) = true if and only if either

a) $(R \cap S) \to R$ or $(R \cap S) \to S$, or

b) $(R \cap S) \twoheadrightarrow (R - S) \mid (S - R)$ and not both of $(R - S) \to (R \cap S)$ and $(S - R) \to (R \cap S)$ are true.

This rule is essentially the MVD rule, but prohibits navigation through “connection traps” [Co], such
as from loan to bank to account in Fig. 2.

Rule 3 is more stringent than Rule 2, although more liberal than Rule 1. There is a seeming paradox
with Rule 3. Not knowing about a valid FD can allow it to create larger maximal objects than if we recognized
the dependency. The motivation for Rule 3 is that in the absence of explicit directions to the contrary, we
conjecture that a user does not want to navigate through a connection trap. That is, in Fig. 2, if the user had
in mind a connection between loans and accounts, it would more likely be through customer than through
bank. We shall let the reader make up his own mind whether that conjecture is true. Even if Rule 3 is the
method adopted for constructing maximal objects, the user can always force a link to go through BNK by
a two-tuple-variable query similar to Q4.


Example 6: The maximal objects of Fig. 2 are constructed using the FD’s-only rule. We obtain the upper
maximal object by starting with object ACC-C. We can add ACC-BAL because of the FD ACC → BAL. We
add C-ADR because of the FD C → ADR. BNK-ACC is added because of the FD ACC → BNK. The final
maximal object includes attributes { ACC, C, BAL, ADR, BNK }.

Continuing the analysis of Fig. 2, the lower maximal object is created from C-L in a fashion quite
analogous to the way the upper one was created. Starting with any object other than C-ACC or C-L yields
a subset of the two maximal objects already constructed, so they are the only maximal objects produced.

Either of the other two rules for LOSSLESS yields the same maximal objects as in Fig. 2. []
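
The construction of this example can be replayed with a short Python sketch of Algorithm 3 under the FD’s-only rule. Several choices here are assumptions rather than part of the paper: the names `closure`, `lossless_fd`, and `maximal_objects`; the use of attribute closure to test whether an FD follows from the given ones; picking the first eligible object in list order where the algorithm allows any; and discarding results that are proper subsets of another result, which is one reading of the remark that other starting objects yield subsets.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def lossless_fd(R, S, fds):
    """FD's-only rule: (R ∩ S) -> R or (R ∩ S) -> S."""
    determined = closure(R & S, fds)
    return R <= determined or S <= determined

def maximal_objects(objects, fds):
    """Grow a maximal object from every starting object, as in Algorithm 3."""
    found = []
    for start in objects:
        mo, S = {start}, set(start)
        while True:
            candidates = [T for T in objects
                          if T not in mo and lossless_fd(set(T), S, fds)]
            if not candidates:
                break
            mo.add(candidates[0])      # any eligible object may be chosen
            S |= candidates[0]
        found.append(frozenset(mo))
    # Keep only results not strictly contained in another result.
    return {m for m in found if not any(m < other for other in found)}

OBJECTS = [frozenset(o) for o in (
    {"C", "ADR"}, {"C", "ACC"}, {"ACC", "BAL"}, {"ACC", "BNK"},
    {"C", "L"}, {"L", "AMT"}, {"L", "BNK"})]
FDS = [({"ACC"}, {"BNK", "BAL"}), ({"L"}, {"BNK", "AMT"}), ({"C"}, {"ADR"})]

result = maximal_objects(OBJECTS, FDS)
```

On the seven objects and three FD’s of Example 1, `result` contains exactly two sets of objects, the upper and lower maximal objects of Fig. 2.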

The FD rule is easily seen to be monotone, while the other two rules are not, in general. However, the
MVD rule is monotone in the important special case where all the MVD’s follow from the given JD and
FD’s. Since all the given dependencies are full, it follows from the “chase” inference method of [MMS] that
whenever we infer an embedded MVD $X \twoheadrightarrow Y \mid Z$, there is some full MVD $X \twoheadrightarrow Y' \mid Z'$, where $Y \subseteq Y'$ and
$Z \subseteq Z'$, from which the embedded MVD follows.

In that case, rule (2) says LOSSLESS(R, S) is true if and only if $R \cap S$ multidetermines a set of attributes
that includes $R - S$ but no attribute of $S - R$. In fact, since we start with a single JD, the test can be made
in polynomial time [MSY].

There  is  another  important  fact  about  the  MVD  rule,  which  is  brought  out  in  the  following  theorem. 
This  result  says  that  when  the  JD  defining  the  universal  relation  structure  is  acyclic,  there  is  only  one 
maximal  object,  and  therefore  Algorithms  1  and  2  treat  queries  the  same  way. 

Theorem  1:  If  the  given  JD  has  an  acyclic  hypergraph  [FMU]  then  every  connected  component  of  the 
hypergraph  is  a  maximal  object  by  the  MVD  rule. 

Proof: We prove the result by induction on the number of edges in the hypergraph. The basis, one edge, is
trivial. Each nontrivial connected component contains an edge (object) E with an attribute not present in
any other edge and whose intersection with the union of the other edges is contained in one of those edges
[BFMY]. If we remove E, the component remains acyclic, as can be proved easily using the Graham reduction
test of [BFMY]. By induction on the number of edges, we claim there is an edge R in the component of E,
with E removed, such that Algorithm 3, started with R, produces a maximal object with at least all the
edges other than E in the component. Let the set of attributes in this maximal object be S. The given JD
implies (S ∩ E) →→ (E − S) | (S − E) [FMU]. Thus, E will be adjoined to S in Algorithm 3 to form a larger
maximal object. []
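
The Graham reduction test used in the proof can be sketched as follows. This is the commonly stated form of the test, not necessarily the exact formulation in [BFMY]: repeatedly delete any attribute that occurs in only one edge and any edge contained in another edge, and call the hypergraph acyclic if and only if nothing remains. The function name and the example data are hypothetical.

```python
def is_acyclic(edges):
    """Graham reduction: True iff the hypergraph reduces to nothing."""
    edges = [set(e) for e in edges]
    changed = True
    while changed:
        changed = False
        # Rule 1: delete any attribute that occurs in exactly one edge.
        for e in edges:
            for a in list(e):
                if sum(a in f for f in edges) == 1:
                    e.discard(a)
                    changed = True
        # Rule 2: delete one edge that is empty or contained in another edge.
        for i, e in enumerate(edges):
            if not e or any(i != j and e <= f for j, f in enumerate(edges)):
                del edges[i]
                changed = True
                break
    return not edges


# The seven objects of the banking example, written as attribute sets.
banking = [{"C", "ADR"}, {"C", "ACC"}, {"ACC", "BAL"}, {"ACC", "BNK"},
           {"L", "BNK"}, {"L", "AMT"}, {"C", "L"}]
# The same hypergraph with L-BNK dropped, which breaks the cycle.
no_cycle = [e for e in banking if e != {"L", "BNK"}]

print(is_acyclic(banking))
print(is_acyclic(no_cycle))
```

On the banking objects the reduction stalls on the four edges C-ACC, ACC-BNK, L-BNK, and C-L, which form the cycle. With L-BNK removed, every edge is eventually eliminated.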

V.  Further  Considerations  and  Conclusions 

We  have  given  three  methods  for  constructing  sets  of  maximal  objects.  Only  experience  in  a  variety  of 
applications  will  show  which  method  constructs  maximal  objects  that  give  the  best  answers.  Of  course,  a 
database  designer  is  always  free  to  include  other  maximal  objects  to  make  queries  produce  intuitively  correct 
answers. For example, if the designer determined that the connection between loans and accounts for query
Q3 is always through customer, the maximal object {C-ACC, C-L} could be added. This maximal object
would let Q3 connect L and ACC through C.

Another way in which user-defined maximal objects can help is if there are embedded MVD’s that do
not follow from the given JD and FD’s. For example, suppose that loans could be made by consortiums of
banks, so the FD L → BNK no longer held. Then any of the three methods proposed for constructing maximal
objects could find three: {C-ADR, C-ACC, ACC-BAL, ACC-BNK}, {L-BNK, L-AMT}, and {L-AMT, C-L,
C-ADR}. That is, the lower maximal object gets split in two.
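
The split can be reproduced with a small sketch of the construction under the FD rule. Two things here are assumptions rather than quotations of the earlier sections: that the FD rule makes LOSSLESS(R, S) true exactly when the FD's imply (R ∩ S) → (S − R), and that the construction grows a set of objects from each starting object until nothing more can be adjoined, keeping only the largest results. The FD's listed (C → ADR, ACC → BAL, ACC → BNK, L → AMT, and optionally L → BNK) are also assumed for the banking example. All function and variable names are hypothetical.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under FD's given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def maximal_objects(objects, fds):
    """Grow a set of objects from each start; keep the sets not contained in another."""
    objects = [frozenset(o) for o in objects]
    grown = set()
    for start in objects:
        members, attrs = {start}, set(start)
        added = True
        while added:
            added = False
            for obj in objects:
                if obj in members:
                    continue
                common = attrs & obj
                # Assumed FD rule: (R ∩ S) -> (S − R) must follow from the FD's.
                if common and obj - attrs <= closure(common, fds):
                    members.add(obj)
                    attrs |= obj
                    added = True
        grown.add(frozenset(members))
    return {m for m in grown if not any(m < other for other in grown)}


objects = [{"C", "ADR"}, {"C", "ACC"}, {"ACC", "BAL"}, {"ACC", "BNK"},
           {"L", "BNK"}, {"L", "AMT"}, {"C", "L"}]
fds = [({"C"}, {"ADR"}), ({"ACC"}, {"BAL"}), ({"ACC"}, {"BNK"}), ({"L"}, {"AMT"})]

with_l_bnk = maximal_objects(objects, fds + [({"L"}, {"BNK"})])
without_l_bnk = maximal_objects(objects, fds)
print(len(with_l_bnk), len(without_l_bnk))
```

Under these assumptions the sketch yields two maximal objects while L → BNK holds and the three listed above once it is dropped.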

Now, the response of Algorithm 2 to Q1 is to print only the banks at which Jones has an account, since
only the upper maximal object includes all the attributes of t. We might feel that that answer is wrong,
because Jones is still linked to all the banks to which he is related by being co-holder of a loan of which
the bank is co-grantor. If one believes that to be the case, then one is really asserting the embedded MVD
L →→ BNK | C, that is, all banks granting a loan relate to all customers holding the loan.

Instead of declaring this embedded MVD, which leads to difficulties when we try to interpret queries
by inferring lossless joins (see, e.g., [MMS]), we would simply declare the lower maximal object, even though
it doesn’t follow from any of the construction rules we have proposed. This approach effectively substitutes
maximal object declarations for certain collections of embedded MVD’s. It is not clear to what
extent it enables us to ignore embedded MVD’s entirely (except those that follow from the given JD or
FD’s), but the method appears promising.

The  purpose  of  the  universal  relation  user  view  for  query  evaluation,  and  the  use  of  maximal  objects 
therewith,  is  to  remove  the  requirement  of  explicit  knowledge  of  the  database  structure  from  the  user. 
However, the sophisticated user could use knowledge of maximal objects to his advantage. One possibility
is to allow operands in the <condition>-clause of a query of the trivial form t.A, in order to require that
tuple variable t navigate only in maximal objects containing A. For example, a variant of query Q1 is

retrieve(t.BNK)                                   (Q6)
where t.C=‘Jones’ and t.ACC

Query Q6 would be evaluated by letting t range only over the upper maximal object in Fig. 2. The query
would produce just those banks where Jones has an account.
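
The effect of the trivial operand t.ACC can be sketched as a filter on maximal objects. The sketch assumes, as the discussion above suggests, that a tuple variable ranges over exactly those maximal objects containing every attribute it mentions. The attribute sets for the upper and lower maximal objects are inferred from the object lists given earlier, and the function name is hypothetical.

```python
# Attribute sets of the two maximal objects of Fig. 2 (inferred, not quoted).
UPPER = frozenset({"C", "ADR", "ACC", "BAL", "BNK"})
LOWER = frozenset({"C", "ADR", "L", "AMT", "BNK"})


def eligible(query_attrs, maximal_objects):
    """Maximal objects over which a tuple variable mentioning query_attrs may range."""
    return [m for m in maximal_objects if set(query_attrs) <= m]


# A query mentioning only t.BNK and t.C can navigate in both maximal objects.
both = eligible({"BNK", "C"}, [UPPER, LOWER])
# Adding the trivial operand t.ACC leaves only the upper one.
only_upper = eligible({"BNK", "C", "ACC"}, [UPPER, LOWER])
print(len(both), len(only_upper))
```

Mentioning t.ACC changes no selection condition on the data; it only narrows the set of maximal objects that qualify.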

An alternative way to pass some navigation control to the user is by using aliases for some of the
attributes to indicate in which maximal object the attribute is considered to lie. For example, we could have
DEP (depositor) as an alias for C indicating the upper maximal object, and BOR (borrower) as an alias for
C indicating the lower maximal object. With these aliases, the query

retrieve(t.BNK)                                   (Q7)
where t.DEP=‘Jones’

has the same meaning as query Q6 above.
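
One way to picture the alias mechanism is as a table mapping each alias to a real attribute together with the maximal object it pins. The sketch below is purely illustrative: the table layout, the names "upper" and "lower", and the function resolve are hypothetical, and the attribute sets are inferred from the object lists given earlier.

```python
# Hypothetical alias table: alias -> (real attribute, pinned maximal object).
ALIASES = {"DEP": ("C", "upper"), "BOR": ("C", "lower")}
MAXIMAL = {
    "upper": {"C", "ADR", "ACC", "BAL", "BNK"},
    "lower": {"C", "ADR", "L", "AMT", "BNK"},
}


def resolve(query_attrs):
    """Replace aliases by real attributes and list the maximal objects still allowed."""
    real, allowed = set(), set(MAXIMAL)
    for a in query_attrs:
        if a in ALIASES:
            attr, pinned = ALIASES[a]
            real.add(attr)
            allowed &= {pinned}  # an alias restricts navigation to its object
        else:
            real.add(a)
    # Keep only allowed maximal objects that contain every real attribute.
    return real, sorted(n for n in allowed if real <= MAXIMAL[n])


print(resolve({"BNK", "DEP"}))
print(resolve({"BNK", "C"}))
```

With DEP in place of C, only the upper maximal object remains, which is the same restriction obtained by adding the trivial operand t.ACC.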

To conclude, we note that, as a consequence of Theorem 1, if the given JD has a connected, acyclic hypergraph,
then the maximal object concept has no effect when the MVD rule is used. This should be the case, as a
connected, acyclic hypergraph implies unique connections in the hypergraph among attributes. Thus, the
universal relation idea by itself appears adequate when no ambiguity regarding navigation paths is present.

It  is  only  when  cycles  occur,  as  in  Fig.  2,  that  the  need  for  maximal  objects  surfaces.  We  cannot  be 
certain  that  maximal  objects  are  more  likely  to  give  intuitively  correct  answers  than  the  pure  universal 
relation  interpretation  of  queries  (Algorithm  1).  However,  in  Section  IV  we  discussed  an  algorithm  that  finds 
maximal  objects  that  naturally  reflect  the  longest  paths  over  which  we  can  navigate  through  a  particular 
object  while  maintaining  a  lossless  join  of  the  relations  over  which  we  travel.  This  origin  for  maximal  objects 
lends  a  certain  plausibility  to  their  use. 